Neo: A Learned Query Optimizer
Query optimization is one of the most challenging problems in database
systems. Despite the progress made over the past decades, query optimizers
remain extremely complex components that require a great deal of hand-tuning
for specific workloads and datasets. Motivated by this shortcoming and inspired
by recent advances in applying machine learning to data management challenges,
we introduce Neo (Neural Optimizer), a novel learning-based query optimizer
that relies on deep neural networks to generate query execution plans. Neo
bootstraps its query optimization model from existing optimizers and continues
to learn from incoming queries, building upon its successes and learning from
its failures. Furthermore, Neo naturally adapts to underlying data patterns and
is robust to estimation errors. Experimental results demonstrate that Neo, even
when bootstrapped from a simple optimizer like PostgreSQL, can learn a model
that offers similar performance to state-of-the-art commercial optimizers, and
in some cases even surpasses them.
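As an illustration only (the abstract does not spell out Neo's architecture or search procedure), the following Python sketch shows one way a learned value model could steer plan construction: a hypothetical PlanValueNet scores partial plans, and a greedy bottom-up search joins whichever pair of sub-plans the model predicts to be cheapest. Neo itself uses a trained deep network and a more sophisticated plan search; everything below is a toy stand-in.

```python
import itertools
import numpy as np

# Hypothetical stand-in for Neo's learned value network: predicts the latency
# reachable from a partial plan. Here it is just a random linear model over a
# toy featurization of which tables the partial plan has already joined.
class PlanValueNet:
    def __init__(self, n_tables, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.normal(size=n_tables)

    def predict(self, joined_tables, n_tables):
        feat = np.zeros(n_tables)
        feat[list(joined_tables)] = 1.0
        return float(self.w @ feat)

def greedy_plan(tables, value_net):
    """Bottom-up greedy search: repeatedly join the pair of sub-plans whose
    combined result the value network predicts to be cheapest."""
    subplans = [frozenset([t]) for t in tables]
    plan_tree = {frozenset([t]): t for t in tables}
    while len(subplans) > 1:
        best = None
        for a, b in itertools.combinations(subplans, 2):
            merged = a | b
            score = value_net.predict(merged, len(tables))
            if best is None or score < best[0]:
                best = (score, a, b, merged)
        _, a, b, merged = best
        plan_tree[merged] = (plan_tree[a], plan_tree[b])
        subplans = [s for s in subplans if s not in (a, b)] + [merged]
    return plan_tree[subplans[0]]

tables = list(range(4))
net = PlanValueNet(n_tables=len(tables))
print(greedy_plan(tables, net))   # nested tuple describing the join tree
```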
Bao: Learning to Steer Query Optimizers
Query optimization remains one of the most challenging problems in data
management systems. Recent efforts to apply machine learning techniques to
query optimization challenges have been promising, but have shown few practical
gains due to substantial training overhead, inability to adapt to changes, and
poor tail performance. Motivated by these difficulties and drawing upon a long
history of research in multi-armed bandits, we introduce Bao (the BAndit
Optimizer). Bao takes advantage of the wisdom built into existing query
optimizers by providing per-query optimization hints. Bao combines modern tree
convolutional neural networks with Thompson sampling, a decades-old and
well-studied reinforcement learning algorithm. As a result, Bao automatically
learns from its mistakes and adapts to changes in query workloads, data, and
schema. Experimentally, we demonstrate that Bao can quickly (an order of
magnitude faster than previous approaches) learn strategies that improve
end-to-end query execution performance, including tail latency. In cloud
environments, we show that Bao can offer both reduced costs and better
performance compared with a sophisticated commercial system.
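A minimal sketch of the bandit view described above, with assumed hint-set names and a context-free Gaussian posterior per arm in place of Bao's tree convolutional model: Thompson sampling draws a latency estimate from each arm's posterior, runs the query with the hint set whose draw is lowest, and updates that arm with the observed latency.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical hint sets (arms). Real Bao steers the optimizer with sets of
# planner flags and conditions its choice on the query plan via a tree CNN;
# this sketch keeps only the Thompson-sampling decision loop.
HINT_SETS = ["default", "no_nestloop", "no_hash_join", "no_index_scan"]

class GaussianArm:
    """Conjugate Normal posterior over an arm's mean latency (seconds),
    assuming a known observation standard deviation."""
    def __init__(self, prior_mean=1.0, prior_std=1.0, obs_std=0.3):
        self.mu, self.var = prior_mean, prior_std ** 2
        self.obs_var = obs_std ** 2

    def sample(self):
        return rng.normal(self.mu, np.sqrt(self.var))

    def update(self, latency):
        precision = 1.0 / self.var + 1.0 / self.obs_var
        self.mu = (self.mu / self.var + latency / self.obs_var) / precision
        self.var = 1.0 / precision

def true_latency(arm_idx):
    # Toy environment: each hint set has an unknown mean latency.
    means = [1.0, 0.6, 0.9, 1.2]
    return max(0.05, rng.normal(means[arm_idx], 0.3))

arms = [GaussianArm() for _ in HINT_SETS]
for query in range(200):
    # Thompson sampling: draw a latency estimate from each posterior and
    # execute the query with the hint set whose draw is lowest.
    choice = int(np.argmin([a.sample() for a in arms]))
    arms[choice].update(true_latency(choice))

print({h: round(a.mu, 2) for h, a in zip(HINT_SETS, arms)})
```

Because each arm's posterior keeps shrinking only when that arm is chosen, the sampling step naturally balances exploring rarely tried hint sets against exploiting the one that currently looks fastest.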
Learning Scheduling Algorithms for Data Processing Clusters
Efficiently scheduling data processing jobs on distributed compute clusters
requires complex algorithms. Current systems, however, use simple generalized
heuristics and ignore workload characteristics, since developing and tuning a
scheduling policy for each workload is infeasible. In this paper, we show that
modern machine learning techniques can generate highly-efficient policies
automatically. We present Decima, which uses reinforcement learning (RL) and neural networks to
learn workload-specific scheduling algorithms without any human instruction
beyond a high-level objective such as minimizing average job completion time.
Off-the-shelf RL techniques, however, cannot handle the complexity and scale of
the scheduling problem. To build Decima, we had to develop new representations
for jobs' dependency graphs, design scalable RL models, and invent RL training
methods for dealing with continuous stochastic job arrivals. Our prototype
integration with Spark on a 25-node cluster shows that Decima improves the
average job completion time over hand-tuned scheduling heuristics by at least
21%, achieving up to 2x improvement during periods of high cluster load.
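A toy sketch of the decision loop this describes, under strong simplifying assumptions: a hand-written two-feature scorer stands in for Decima's graph neural network, there is a single job DAG on one executor, and the policy weights are fixed rather than trained with RL. Runnable stages are scored, the next stage is sampled from a softmax over those scores, and the episode reports the average stage completion time.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy job DAG (stage -> parent stages that must finish first) and per-stage
# work in seconds. None of these numbers come from the paper.
parents = {0: [], 1: [0], 2: [0], 3: [1, 2], 4: []}
work = {0: 5.0, 1: 2.0, 2: 8.0, 3: 1.0, 4: 3.0}

def num_children(stage):
    return sum(stage in ps for ps in parents.values())

def runnable(done):
    return [s for s in parents
            if s not in done and all(p in done for p in parents[s])]

def run_episode(theta):
    """Schedule the DAG on one executor: sample the next stage from a softmax
    over per-stage scores and track the average stage completion time."""
    done, clock, completions = set(), 0.0, []
    while len(done) < len(parents):
        cand = runnable(done)
        feats = np.array([[work[s], num_children(s)] for s in cand])
        logits = feats @ theta
        probs = np.exp(logits - logits.max())
        probs /= probs.sum()
        s = int(rng.choice(cand, p=probs))
        clock += work[s]
        completions.append(clock)
        done.add(s)
    return np.mean(completions)

theta = np.array([-0.5, 1.0])   # prefer short stages that unblock many others
print("average stage completion time:", run_episode(theta))
```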
A Computational Approach to Packet Classification
Multi-field packet classification is a crucial component in modern
software-defined data center networks. To achieve high throughput and low
latency, state-of-the-art algorithms strive to fit the rule lookup data
structures into on-die caches; however, they do not scale well with the number
of rules. We present a novel approach, NuevoMatch, which improves the memory
scaling of existing methods. A new data structure, Range Query Recursive Model
Index (RQ-RMI), is the key component that enables NuevoMatch to replace most of
the accesses to main memory with model inference computations. We describe an
efficient training algorithm that guarantees the correctness of the
RQ-RMI-based classification. The use of RQ-RMI allows the rules to be
compressed into model weights that fit into the hardware cache. Further, it
takes advantage of the growing support for fast neural network processing in
modern CPUs, such as wide vector instructions, achieving a rate of tens of
nanoseconds per lookup. Our evaluation using 500K multi-field rules from the
standard ClassBench benchmark shows a geometric mean compression factor of
4.9x, 8x, and 82x, and average performance improvement of 2.4x, 2.6x, and 1.6x
in throughput compared to CutSplit, NeuroCuts, and TupleMerge, all
state-of-the-art algorithms.
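A simplified, single-field illustration of the learned-index idea behind RQ-RMI (not the actual recursive model, its guaranteed training procedure, or its multi-field handling): fit a model from keys to positions in a sorted array of range starts, record its worst-case error, and answer lookups with a bounded search inside that error window, so most of the lookup work becomes model inference rather than wide memory traversal.

```python
import bisect
import numpy as np

rng = np.random.default_rng(0)

# Sorted "range start" keys of 10,000 toy rules and a linear model from key
# to array position. The real RQ-RMI is a recursive hierarchy of small models
# with a trained error bound; this is a one-stage stand-in.
starts = np.sort(rng.integers(0, 1_000_000, size=10_000))
positions = np.arange(len(starts))

slope, intercept = np.polyfit(starts, positions, 1)
pred = np.clip(np.round(slope * starts + intercept), 0, len(starts) - 1)
max_err = int(np.max(np.abs(pred - positions))) + 1   # +1 pads for unseen keys

def lookup(key):
    """Index of the last range start <= key (the candidate matching rule)."""
    guess = int(np.clip(round(slope * key + intercept), 0, len(starts) - 1))
    lo, hi = max(0, guess - max_err), min(len(starts), guess + max_err + 1)
    # Bounded binary search restricted to the model's error window.
    return max(bisect.bisect_right(starts, key, lo, hi) - 1, 0)

key = 123_456
i = lookup(key)
print(i, int(starts[i]), bool(starts[i] <= key))
```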
Flow-Loss: Learning Cardinality Estimates That Matter
Previous approaches to learned cardinality estimation have focused on
improving average estimation error, but not all estimates matter equally. Since
learned models inevitably make mistakes, the goal should be to improve the
estimates that make the biggest difference to an optimizer. We introduce a new
loss function, Flow-Loss, that explicitly optimizes for better query plans by
approximating the optimizer's cost model and dynamic programming search
algorithm with analytical functions. At the heart of Flow-Loss is a reduction
of query optimization to a flow routing problem on a certain plan graph in
which paths correspond to different query plans. To evaluate our approach, we
introduce the Cardinality Estimation Benchmark, which contains the ground truth
cardinalities for sub-plans of over 16K queries from 21 templates with up to 15
joins. We show that across different architectures and databases, a model
trained with Flow-Loss improves the cost of plans (using the PostgreSQL cost
model) and query runtimes despite having worse estimation accuracy than a model
trained with Q-Error. When the test set queries closely match the training
queries, both models improve performance significantly over PostgreSQL and are
close to the optimal performance (using true cardinalities). However, the
Q-Error trained model degrades significantly when evaluated on queries that are
slightly different (e.g., similar but not identical query templates), while the
Flow-Loss trained model generalizes better to such situations. For example, the
Flow-Loss model achieves up to 1.5x better runtimes on unseen templates
compared to the Q-Error model, despite leveraging the same model architecture
and training data.
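A small worked example of the claim that not all estimates matter equally (an illustration of the motivation, not the Flow-Loss function itself): an estimator with a better Q-error can still misorder sub-plan cardinalities and lead a greedy optimizer to the worse join order, while a much less accurate estimator whose errors preserve the ordering yields the better plan. All cardinalities below are invented.

```python
# Two cardinality estimators for the sub-plans of a toy 3-table join.
true_card = {"A join B": 1_000, "B join C": 2_000}
est_good_qerr = {"A join B": 1_500, "B join C": 1_400}    # max Q-error 1.5, wrong ordering
est_bad_qerr  = {"A join B": 5_000, "B join C": 40_000}   # max Q-error 20, right ordering

def max_q_error(est):
    return max(max(e / true_card[k], true_card[k] / e)
               for k, e in est.items())

def first_join(est):
    # A greedy optimizer joins the pair with the smaller estimated result first.
    return min(est, key=est.get)

for name, est in [("good Q-error", est_good_qerr), ("bad Q-error", est_bad_qerr)]:
    print(name, "| max Q-error:", round(max_q_error(est), 1),
          "| picks:", first_join(est), "| true best:", first_join(true_card))
```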
Comyco: Quality-Aware Adaptive Video Streaming via Imitation Learning
Learning-based Adaptive Bit Rate (ABR) methods, which aim to learn strong
strategies without prior assumptions, have become one of the research hotspots
in adaptive streaming. However, they typically suffer from several issues,
namely low sample efficiency and a lack of awareness of video quality
information. In this paper, we propose Comyco, a video quality-aware ABR
approach that substantially improves on learning-based methods by tackling the
above issues. Comyco trains the policy by imitating expert trajectories given
by the instant solver, which can not only avoid redundant exploration but also
make better use of the collected samples. Meanwhile, Comyco attempts to pick
chunks with higher perceptual video quality rather than higher bitrate. To
achieve this, we construct Comyco's neural network architecture, video datasets
and QoE metrics with video quality features. Using trace-driven and real-world
experiments, we demonstrate significant improvements in Comyco's sample
efficiency compared to prior work: it requires 1700x fewer samples and 16x less
training time.
Moreover, the results show that Comyco outperforms previously proposed
methods, improving average QoE by 7.5%-16.79%. In particular, Comyco surpasses
the state-of-the-art approach Pensieve by 7.37% in average video quality under
the same rebuffering time.
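A minimal imitation-learning sketch in the spirit described above, with toy features and a toy "instant solver" (none of these names, bitrates, or thresholds come from the paper): the solver labels the best bitrate for each streaming state, and a linear softmax policy is fit to those labels with a cross-entropy loss instead of exploring by trial and error.

```python
import numpy as np

rng = np.random.default_rng(0)

BITRATES = np.array([0.3, 0.75, 1.2, 2.85])  # Mbps (toy ladder)

def instant_solver(throughput, buffer_s):
    # Expert: highest bitrate the measured throughput can sustain, falling
    # back to the lowest bitrate when the playback buffer is nearly empty.
    if buffer_s < 2.0:
        return 0
    feasible = np.where(BITRATES <= throughput)[0]
    return int(feasible[-1]) if len(feasible) else 0

# Collect expert demonstrations over random streaming states.
throughput = rng.uniform(0.2, 4.0, 2000)      # Mbps
buffer_s = rng.uniform(0.0, 30.0, 2000)       # seconds of buffered video
X = np.column_stack([throughput / 4.0, buffer_s / 30.0, np.ones(2000)])
y = np.array([instant_solver(t, b) for t, b in zip(throughput, buffer_s)])

# Fit a linear softmax policy by gradient descent on the cross-entropy loss.
W = np.zeros((3, len(BITRATES)))
for _ in range(2000):
    logits = X @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    W -= 1.0 * X.T @ (p - np.eye(len(BITRATES))[y]) / len(y)

print("agreement with expert:", (np.argmax(X @ W, axis=1) == y).mean())
```

Because every sample comes with an expert label, the policy never has to discover good actions by rebuffering its way through bad ones, which is the sample-efficiency argument the abstract makes.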
Interpreting Deep Learning-Based Networking Systems
While many deep learning (DL)-based networking systems have demonstrated
superior performance, the underlying Deep Neural Networks (DNNs) remain
black boxes that network operators cannot interpret. The lack of
interpretability makes DL-based networking systems difficult to deploy in
practice. In this paper, we propose Metis, a framework that provides
interpretability for two general categories of networking problems spanning
local and global control. Accordingly, Metis introduces two interpretation
methods, based on decision trees and hypergraphs: it converts DNN policies into
interpretable rule-based controllers and highlights critical components through
hypergraph analysis. We evaluate Metis over several
state-of-the-art DL-based networking systems and show that Metis provides
human-readable interpretations with nearly no degradation in performance. We
further present four concrete use cases of Metis, showcasing
how Metis helps network operators to design, debug, deploy, and ad-hoc adjust
DL-based networking systems.
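A hedged sketch of the decision-tree style of interpretation described above, using an invented rule-based "policy" in place of a real DL-based networking system (states, features, and thresholds are all assumptions): sample states, query the black-box policy for its decisions, then fit a shallow decision tree to those decisions so the resulting rules are human-readable.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)

def blackbox_policy(states):
    # Stand-in for a DNN policy: pick a forwarding action from the queue
    # length and link utilization. A real system would run model inference.
    queue, load = states[:, 0], states[:, 1]
    return np.where(queue > 50, 2, np.where(load > 0.7, 1, 0))

# Sample states, label them with the black-box policy, and distill.
states = np.column_stack([rng.uniform(0, 100, 5000),   # queue length (pkts)
                          rng.uniform(0, 1, 5000)])    # link utilization
actions = blackbox_policy(states)

tree = DecisionTreeClassifier(max_depth=3).fit(states, actions)
print("fidelity to the black-box policy:", tree.score(states, actions))
print(export_text(tree, feature_names=["queue_len", "link_util"]))
```

The printed tree is the interpretable controller: each root-to-leaf path reads as an if-then rule an operator can inspect, while fidelity reports how closely those rules track the original policy's decisions.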